Using Lexical and Logical Methods for the Alignment of Medical Terminologies

Authors

  • Michel C. A. Klein
  • Zharko Aleksovski
Abstract

Standardized medical terminologies are often used for the registration of patient data. In several situations there is a need to align these terminologies with other terminologies. Even when the terminologies cover the same domain, this is often a non-trivial task. The task is even more complicated when a terminology contains little structure. In this paper we describe the initial results of a procedure for mapping a terminology with little or no structure to a structure-rich terminology. This procedure uses the knowledge of the structure-rich terminology and a method for semantic explicitation of concept descriptions. The first results show that, compared to approaches based on syntactic analysis only, recall can be greatly improved without sacrificing much precision.

1 Terminology Alignment for Non-Structured Lists

The use of standardized terminologies for the registration of patient data is very common practice in modern health care. There can be many reasons to link (part of) the patient data to other data sources, e.g. for registration purposes, to find relevant literature references, or to compare hospitals. These tasks often involve aligning the terminology in use with another terminology. Several techniques for ontology (footnote 3) alignment already exist [2–5]. However, the existing techniques are most effective when the ontology contains structure. In practice, quite a few terminologies in use have no or only minor structure (e.g. a controlled list of terms). This often implies that the alignment has to be based on lexical techniques only. In this paper, we report on a method for aligning a flat list of terms with a structure-rich ontology. In this method, we use the knowledge in the ontology as background knowledge. The method produces suggestions for mapping candidates, intended for users who perform an ontology alignment task.
In the next section, we briefly describe the two terminologies that are used in our procedure. In Section 3, we describe the alignment procedure itself and explain our choices. Section 4 presents the first results, and in the last section we discuss our approach and look forward.

Footnote 3: In this paper we do not make a distinction between a structured terminology system and an ontology, and we use the terms interchangeably. See [1] for a more detailed positioning of both.

2 Description of Terminologies

Two terminologies are used in the experiments. The first terminology is a list of “problems” related to patients at an intensive care unit (ICU), developed in a local hospital. This list is partly based on the ICD-9-CM (footnote 4). During its use in the past three years the list has been extended with additional descriptions of medical conditions of patients at the ICU. The resulting list is a mixture of problem descriptions at several levels of abstraction, with minor redundancy. The list is mainly in Dutch but also contains some English terms. In the remainder of this paper we refer to this terminology as the local hospital list. This list contains 1399 problem descriptions of at most 7 words each; 95% of the descriptions consist of three words or fewer.

The second terminology is a structured ICU ontology, developed at the Academic Medical Center in Amsterdam [6]. The ontology defines around 1460 ‘reasons for admission’ at the ICU. These concepts are defined via five axes: ‘tractus’, ‘abnormality’, ‘activity’, ‘anatomical location’ and ‘aetiology’. Each concept in the ontology is related to at least one term, and often to more. This ontology can be seen as a combination of two distinct parts: the taxonomy of ‘reasons for admission’ itself, and an ontology with background knowledge about anatomy, abnormalities, etc.

3 Alignment Approach

The goal of the approach is to map the unstructured descriptions in the local hospital list to concepts that are defined in the ‘reasons for admission’ subtree of the ontology. The procedure consists of four steps:

1. Apply lexical mapping techniques to match terms in the problem descriptions to terms from the five defining axes in the ontology (i.e. the background knowledge).
2. Translate problem descriptions from the local hospital list into logical formulas.
3. Create logical formulas for ontology concepts using both their terms and the terms from the five defining axes.
4. Classify formulas from the local hospital list in the ontology.

The first step results in an annotated version of the local hospital list, in which each term in the descriptions is labelled with the name of the axis in the ontology in which the term or a synonym of it is found. For example, the description colon carcinoom is not directly found in the ontology. However, the term colon is found in the ‘anatomical location’ subtree, while a synonym of the term carcinoom is found in the ‘abnormality’ subtree. In the resulting annotation, colon is labelled ‘anatomical location’ and carcinoom is labelled ‘abnormality’. Not all terms can be found in the ontology, so there will also be unlabelled terms.

For the classification, we apply an approximate matching method that is described in [7]. The idea behind this method is to transform the concepts from two ontologies into propositional formulas and then to discover matchings by finding equivalence or subsumption relations between the formulas of the different ontologies. The transformation of the concepts into formulas consists of two parts: transforming the natural language descriptions into logical formulas (semantic explicitation), and using the relations among the concepts in the ontologies (contextualization).

Footnote 4: http://www.cdc.gov/nchs/about/otheract/icd9/abticd9.htm

The semantic explicitation is intended to capture the meaning of the natural language descriptions in logical formulas.
In some cases, certain word combinations will make no sense and should be discarded. The idea behind the contextualization is to make the knowledge that is encoded in the structure of the ontology explicit. For example, a concept can be represented as a disjunction of its preferred term and all its synonyms; or it can be represented in conjunction with its superconcepts, since superconcepts are naturally assumed to have a broader meaning.

In our experiment, the logical formulas for the descriptions in the local hospital list are formed by creating a disjunction of all possible interpretations of the word combinations that are likely to make sense. For example, the term abdominal aneurysm aorta is translated into the following formula:

(abdominal ∩ aneurysm ∩ aorta) ∪ (abdominal aneurysm ∩ aorta) ∪ (abdominal ∩ aneurysm aorta) ∪ (abdominal aneurysm aorta)

This encodes all different interpretations of the description, in which subsequent terms can form either one composite term or separate terms.

The ‘reasons for admission’ concepts in the ontology (only these concepts, not the terms from the five defining axes) are translated by taking the disjunction of all terms and the conjunction of the values of the attributes that are defined on them. For example, the concept Pneumonia with synonym longontsteking, abnormality infection and location lungs will be translated into:

(pneumonia ∪ longontsteking) ∪ (infection ∩ lungs)

The intuition behind this translation is that the meaning of the concept is either completely captured in one of its terms, or in the combination of its defining attributes. If preferred terms or synonyms consist of several words, they are translated in a similar way as the descriptions in the local hospital list, encoding all possible different interpretations.

Subsumption and equivalence are now decided by comparing the formulas from both terminologies. This is possible because they partly use the same terms, i.e. the terms from the five defining axes (the background knowledge). The ontology concepts were directly defined using these terms, while the local hospital list is mapped to the (synonyms of the) terms in the background knowledge.
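
The formula construction and comparison described above can be illustrated with a minimal Python sketch. It is not the approximate matching method of [7]; as a simplifying assumption it uses plain propositional entailment on negation-free formulas in disjunctive normal form, and it skips the synonym lookup in the background knowledge. All function names (segmentations, description_formula, concept_formula, entails) are hypothetical and chosen for illustration only.

```python
from itertools import product


def segmentations(words):
    """Yield each grouping of consecutive words into composite terms."""
    if not words:
        return
    # Each gap between neighbouring words is either a cut or a join.
    for cuts in product([True, False], repeat=len(words) - 1):
        terms, current = [], [words[0]]
        for word, cut in zip(words[1:], cuts):
            if cut:
                terms.append(" ".join(current))
                current = [word]
            else:
                current.append(word)
        terms.append(" ".join(current))
        yield frozenset(terms)


def description_formula(description):
    """Return a DNF formula: a set of conjunctions, each a frozenset of terms."""
    return set(segmentations(description.lower().split()))


def concept_formula(terms, attribute_values):
    """Disjunction of all terms, plus the conjunction of the attribute values."""
    formula = set()
    for term in terms:
        formula |= description_formula(term)
    formula.add(frozenset(value.lower() for value in attribute_values))
    return formula


def entails(specific, general):
    """Assumed check: every conjunction in `specific` contains one of `general`."""
    return all(any(g <= s for g in general) for s in specific)


aneurysm = description_formula("abdominal aneurysm aorta")
pneumonia = concept_formula(["Pneumonia", "longontsteking"], ["infection", "lungs"])
local_term = description_formula("longontsteking")

print(len(aneurysm))                   # number of interpretations
print(entails(local_term, pneumonia))  # is the local term subsumed by the concept?
print(entails(pneumonia, local_term))  # does the reverse direction hold?
```

In this sketch the three-word description yields four interpretations, matching the formula for abdominal aneurysm aorta. The local term longontsteking is subsumed by the Pneumonia concept but not the other way round, so under these assumptions the pair would be reported as a subsumption rather than an equivalence.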

Related resources

A system for automated lexical mapping.

OBJECTIVE To automate the mapping of disparate databases to standardized medical vocabularies. BACKGROUND Merging of clinical systems and medical databases, or aggregation of information from disparate databases, frequently requires a process whereby vocabularies are compared and similar concepts are mapped. DESIGN Using a normalization phase followed by a novel alignment stage inspired by ...

Interoperability between phenotypes in research and healthcare terminologies—Investigating partial mappings between HPO and SNOMED CT

BACKGROUND Identifying partial mappings between two terminologies is of special importance when one terminology is finer-grained than the other, as is the case for the Human Phenotype Ontology (HPO), mainly used for research purposes, and SNOMED CT, mainly used in healthcare. OBJECTIVES To investigate and contrast lexical and logical approaches to deriving partial mappings between HPO and SNO...

Automatic methods for mapping Biomedical terminologies in a Health Multi-Terminology Portal

Terminology mapping is an important and crucial task to improve semantic interoperability between health care applications and resources. In 2009, CISMeF created a Health Multi-Terminological Portal (HMTP) to search concepts among all the health terminologies available in French (or in English and translated in French) included in this portal and to browse it dynamically. To map terminologies i...

Evaluating alignment quality between iconic language and reference terminologies using similarity metrics

BACKGROUND Visualization of Concepts in Medicine (VCM) is a compositional iconic language that aims to ease information retrieval in Electronic Health Records (EHR), clinical guidelines or other medical documents. Using VCM language in medical applications requires alignment with medical reference terminologies. Alignment from Medical Subject Headings (MeSH) thesaurus and International Classifi...

Coverage of Phenotypes in Standard Terminologies

Objective: To assess the coverage of the Human Phenotype Ontology (HPO) phenotypes in standard terminologies. Methods: We map HPO terms to the UMLS and its source terminologies and compare these lexical mappings to HPO cross-references. Results: Coverage of HPO classes in UMLS is 54% and 30% in SNOMED CT. Lexical mappings largely outnumber cross-references. Conclusions: Our approach can support...

Publication date: 2005